20  More Spatial Joins

Today we will be focusing on the practice of fancy geospatial data analysis and visualization.

20.1 Overview

Two quick things.

  • st_intersection (vs st_filter) for clipping to a boundary
  • summarizing statistics using clipped data

20.1.1 Libraries

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.2     ✔ readr     2.1.4
✔ forcats   1.0.0     ✔ stringr   1.5.0
✔ ggplot2   3.4.3     ✔ tibble    3.2.1
✔ lubridate 1.9.2     ✔ tidyr     1.3.0
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Linking to GEOS 3.11.2, GDAL 3.6.2, PROJ 9.2.0; sf_use_s2() is TRUE

20.2 Data

I am going to use an in-class project example from the Farmlands group.

We’ll pull in California Counties for our boundary and farmlands for the dataset of interest.

County boundaries are available at the California open data portal.

Check the name you download it as

  • if a shapefile you point to the folder directory CA_Counties
  • if a geojson you point to the file itself

We’ll filter it just for Imperial County in this example of farmlands.

Imperial <- st_read(dsn = 'CA_Counties') |> 
  filter(NAME == 'Imperial') |> 
  st_transform(crs =4326)
Reading layer `CA_Counties_TIGER2016' from data source 
  `C:\Dev\EA078_Fall2023\CA_Counties' using driver `ESRI Shapefile'
Simple feature collection with 58 features and 17 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -13857270 ymin: 3832931 xmax: -12705030 ymax: 5162404
Projected CRS: WGS 84 / Pseudo-Mercator

Let’s look at Figure 20.1 for Imperial County boundary to see if it worked.

leaflet() |> 
  addTiles() |> 
  addPolygons(data = Imperial,
              color = 'black',
              fillOpacity = 0.1)
Figure 20.1: Imperial County boundary

Now let’s load the Agland dataset.

Agland <- st_read('AgLand.geojson') |>  
  st_transform(crs=4326)
Reading layer `AgLand' from data source `C:\Dev\EA078_Fall2023\AgLand.geojson' using driver `GeoJSON'
Simple feature collection with 14729 features and 6 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -119.478 ymin: 32.65257 xmax: -114.4952 ymax: 35.05472
Geodetic CRS:  WGS 84

20.3 Summary stats pre-clipping

Let’s make a barchart of the area of farmland in the SCAG planning area.

The Agland file has a variable called ‘Shape_Area’. Let’s summarize the total land in each category on a simple bar chart.

Figure 20.2

Agland |> 
  #remove geometry
  st_set_geometry(value=NULL) |> 
  group_by(TYPE) |> 
  summarize(sumArea = sum(Shape_Area)) |> 
  ggplot(aes(x = sumArea, y = TYPE)) +
  geom_col() +
  theme_bw() +
  scale_x_continuous(labels = scales::comma_format()) +
  labs(x = 'Area (units)', y = 'Category', title = 'Farmland Categories in SoCal')

Figure 20.2: Column chart of area of each farmland type in SoCal

Cool!

20.4 Clip Farmland data to Imperial County

Previously, I have showed st_filter as the function to use for spatial joins. The default type of spatial join in st_filter is classified as st_intersect.

Let’s show an example of st_intersect being applied to the Agland dataset to see how we might get a different result if we focus on a subset of the data.

When doing spatial joins, it can be important to turn off spherical geometry which is the first bit of code sf_use_s2(FALSE).

sf_use_s2(FALSE)
Spherical geometry (s2) switched off
Imp_ag <- Agland |> 
  st_intersection(Imperial)
although coordinates are longitude/latitude, st_intersection assumes that they
are planar
Warning: attribute variables are assumed to be spatially constant throughout
all geometries

Check the map in Figure 20.3 to see it worked.

palFarms <- colorFactor(palette = 'Set2', domain = Imp_ag$TYPE)

leaflet() |> 
  addProviderTiles(provider = providers$Esri.WorldImagery) |> 
  addPolygons(data = Imp_ag,
              color = ~palFarms(TYPE),
              weight = 1,
              fillOpacity = 0.8) |> 
  addLegend(data = Imp_ag,
            title = 'Farmland type',
            pal = palFarms,
            values = ~TYPE)
Figure 20.3: Agland clipped to Imperial County

Cool!

20.4.1 Summary Stats in Imperial County

Now let’s do a bar chart for just the Imperial County data.

Follow the same steps as in Figure 20.2 but apply them to Imp_ag.

Figure 20.4 shows the result.

Imp_ag |> 
  #remove geometry
  st_set_geometry(value=NULL) |> 
  group_by(TYPE) |> 
  summarize(sumArea = sum(Shape_Area)) |> 
  ggplot(aes(x = sumArea, y = TYPE)) +
  geom_col() +
  theme_bw() +
  scale_x_continuous(labels = scales::comma_format()) +
  labs(x = 'Area (units)', y = 'Category', title = 'Farmland Categories in Imperial County')

Figure 20.4: Column chart of area of each farmland type in SoCal